
    Detecting replay attacks in audiovisual identity verification

    We describe an algorithm that detects a lack of correspondence between speech and lip motion by monitoring the degree of synchrony between live audio and visual signals. It is simple, effective, and computationally inexpensive, providing a useful degree of robustness against basic replay attacks and against speech or image forgeries. The method is based on a cross-correlation analysis between two streams of features, one extracted from the audio signal and the other from the image sequence. We argue that such an algorithm forms an effective first barrier against several kinds of replay attack that would defeat existing verification systems based on standard multimodal fusion techniques. To evaluate the new technique, we have augmented the protocols that accompany the BANCA multimedia corpus with new scenarios. We obtain a 0% equal-error rate (EER) on the simplest scenario and 35% on a more challenging one.
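    The cross-correlation idea can be illustrated with a minimal sketch, assuming one 1-D feature per frame for each stream (for example, audio energy and a mouth-opening measure) sampled at the same frame rate. The abstract does not specify the features, the lag range, or the decision threshold, so the names `synchrony_score` and `is_live`, the default `max_lag = 3` frames, and the `threshold = 0.5` are all hypothetical. The score is the peak Pearson correlation over a small window of lags, which tolerates a slight audio/video offset.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def synchrony_score(audio: Sequence[float], visual: Sequence[float], max_lag: int = 3) -> float:
    """Peak normalized cross-correlation over lags in [-max_lag, max_lag] frames.

    At lag L, audio[t] is paired with visual[t + L]; only overlapping frames are used.
    """
    if len(audio) != len(visual):
        raise ValueError("streams must have the same number of frames")
    n = len(audio)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if n - abs(lag) < 2:  # need at least two overlapping frames
            continue
        if lag >= 0:
            a, v = audio[: n - lag], visual[lag:]
        else:
            a, v = audio[-lag:], visual[: n + lag]
        best = max(best, pearson(a, v))
    return best


def is_live(audio: Sequence[float], visual: Sequence[float], threshold: float = 0.5) -> bool:
    """Accept the sample as live if its synchrony score reaches a (hypothetical) threshold."""
    return synchrony_score(audio, visual) >= threshold
```

    A frozen-image replay (constant visual features played against live audio) scores 0.0 here, because a constant stream has no correlation with anything. Sweeping the threshold over genuine and replayed samples would yield the EER operating point.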

    Improving a very-low-bit-rate speech coder by indexing variable-length units

    The objective of this article is to demonstrate the feasibility of very-low-bit-rate speech coding (about 400 bits/s) by indexing variable-length natural speech units, while preserving good quality in the reconstructed speech. The ALISP (Automatic Language Independent Speech Processing) approach, as described in Jan CERNOCKY's thesis, makes these bit rates attainable. However, it suffers from a number of shortcomings that limit the quality of the reproduced speech. We propose several alternative solutions for the segmentation of the signal to be coded, the search for synthesis units, and the concatenation of these units, in order to improve this quality.
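    Why indexing units reaches such low rates can be seen with simple arithmetic: if each unit is sent as an index into a dictionary of N units, plus some per-unit side information, the rate is (units per second) × (⌈log2 N⌉ + side-information bits). The sketch below assumes exactly that model; the function name `index_bitrate` and the example figures (20 units/s, a 4096-entry dictionary, 8 side-information bits) are hypothetical illustrations, not ALISP's actual parameters. Only the ~400 bits/s target comes from the abstract.

```python
def index_bitrate(units_per_second: float, dictionary_size: int, side_info_bits: int = 0) -> float:
    """Bit rate (bits/s) of a coder that sends one dictionary index per speech unit.

    Index bits = ceil(log2(dictionary_size)), computed exactly with integer arithmetic;
    side_info_bits is any extra per-unit payload (e.g. duration or prosody).
    """
    if dictionary_size < 1:
        raise ValueError("dictionary_size must be >= 1")
    index_bits = (dictionary_size - 1).bit_length()  # ceil(log2(N)) for N >= 1
    return units_per_second * (index_bits + side_info_bits)


# Hypothetical figures: 20 units/s, 4096 units (12-bit index), 8 side-info bits.
print(index_bitrate(20, 4096, 8))  # 400
# Longer (variable-length) units mean fewer units per second, hence a lower rate.
print(index_bitrate(10, 4096, 8))  # 200
```

    The second call shows the motivation for longer units: halving the number of units per second halves the rate at a fixed dictionary size, which is one reason unit length and segmentation matter for quality at a given bit budget.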

    Recurrent lateral inhibitory spiking networks for speech enhancement

    Automatic speech recognition accuracy is adversely affected by the presence of noise. In this paper we present a novel noise removal and speech enhancement technique based on spiking neural network processing of speech data. The spiking network has a biologically inspired recurrent lateral topology, modelled specifically on the inhibitory cells of the cochlear nucleus. The network can be configured for different acoustic environments, and we demonstrate how its connectivity enhances the temporal correlation between similar frequency bands and removes uncorrelated noise sources. The speech enhancement capability is demonstrated on data from the TIMIT database with different levels of additive Gaussian white noise. Future directions for further development of this novel approach to noise removal and signal processing are also discussed.
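    The general mechanism of recurrent lateral inhibition can be sketched with a layer of leaky integrate-and-fire neurons, one per frequency channel, where each neuron is inhibited by neighbours that spiked on the previous step. This is a generic textbook-style illustration, not the network from the paper: the function name `lateral_inhibition_lif` and every parameter (`w_inh`, `radius`, `tau`, `threshold`, `dt`) are hypothetical, and the sketch only shows suppression of weakly driven channels; it does not model the correlation enhancement between similar bands.

```python
from typing import List, Sequence


def lateral_inhibition_lif(
    inputs: Sequence[Sequence[float]],
    w_inh: float = 0.5,
    radius: int = 1,
    tau: float = 10.0,
    threshold: float = 1.0,
    dt: float = 1.0,
) -> List[List[int]]:
    """Simulate one layer of leaky integrate-and-fire neurons, one per frequency channel.

    inputs[t][k] is the input current to channel k at step t. Each neuron is inhibited
    by neighbours within `radius` channels that spiked on the previous step
    (recurrent lateral inhibition). Returns spikes[t][k] in {0, 1}.
    """
    n_ch = len(inputs[0])
    decay = 1.0 - dt / tau      # forward-Euler leak factor
    v = [0.0] * n_ch            # membrane potentials
    prev = [0] * n_ch           # spikes emitted on the previous step
    spikes_out: List[List[int]] = []
    for frame in inputs:
        spikes = [0] * n_ch
        for k in range(n_ch):
            first, last = max(0, k - radius), min(n_ch, k + radius + 1)
            inh = sum(prev[j] for j in range(first, last) if j != k)
            # Leak, integrate input, subtract inhibition; potential is floored at 0.
            v[k] = max(0.0, v[k] * decay + dt * frame[k] - w_inh * inh)
            if v[k] >= threshold:
                spikes[k] = 1
                v[k] = 0.0      # reset after firing
        prev = spikes
        spikes_out.append(spikes)
    return spikes_out
```

    With a strongly driven centre channel and weakly driven neighbours (standing in for noise-dominated bands), the neighbours fire occasionally when `w_inh = 0`, but are silenced once inhibition is switched on, while the centre channel's firing is unchanged.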

    Towards conversational technology to promote, monitor and protect mental health

    This paper presents a general overview of the H2020-MSCA-RISE project MENHIR (Mental health monitoring through interactive conversations), whose aim is to explore the possibilities of conversational technologies (chatbots) to understand, promote and protect mental health, and to help people with anxiety and mild depression manage their conditions. MENHIR started in February 2019 and will run for 4 years. Its consortium brings together 8 partners, including universities, a non-profit organization and companies.

    Comparing decision fusion paradigms using k-NN based classifiers, decision trees and logistic regression in a multi-modal identity verification application

    The contribution of this paper is threefold: (1) to formulate a decision fusion problem encountered in the design of a multi-modal identity verification system as a particular classification problem, (2) to propose three simple classifiers to solve this problem, and (3) to compare the relative performances of the proposed classifiers. The multi-modal identity verification system under consideration consists of d modalities in parallel, each delivering as output a scalar number, called a score, that states how well the claimed identity is verified. A fusion module receiving the d scores as input has to take a binary decision: accept or reject the claimed identity. This fusion problem has been solved using three different classifiers, based respectively on the k-nearest-neighbor (k-NN) rule, decision trees and logistic regression. The performances of these fusion modules have been evaluated and compared on a multi-modal database containing both vocal and visual modalities. Keywords: ..
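    The fusion step, treating the d scores as a feature vector and the accept/reject decision as a binary label, can be sketched with a k-NN classifier. This is a minimal illustration under assumptions: the abstract gives neither the distance metric nor k, so Euclidean distance, `k = 3`, the function name `knn_fuse`, and the two-modality training scores in the test are all hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# A labelled training example: (d-dimensional score vector, 1 = client / 0 = impostor).
Example = Tuple[Sequence[float], int]


def knn_fuse(train: Sequence[Example], scores: Sequence[float], k: int = 3) -> int:
    """Fuse d modality scores into an accept (1) / reject (0) decision.

    Majority vote among the k training score vectors nearest to `scores` in
    Euclidean distance; a tied vote (possible only for even k) rejects.
    """
    if len(train) == 0 or k < 1:
        raise ValueError("need a non-empty training set and k >= 1")
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], scores))[:k]
    votes = Counter(label for _, label in nearest)
    return 1 if votes[1] > votes[0] else 0
```

    Swapping in a decision tree or logistic regression changes only the body of the fusion function: the input (d scores) and the output (accept or reject) stay the same, which is what makes the three classifiers directly comparable on one database.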